Interpretable Boosted Naïve Bayes Classification
Authors
Abstract
Voting methods such as boosting and bagging provide substantial improvements in classification performance in many problem domains. However, the resulting predictions can prove inscrutable to end-users. This is especially problematic in domains such as medicine, where end-user acceptance often depends on the ability of a classifier to explain its reasoning. Here we propose a variant of the boosted naïve Bayes classifier that facilitates explanations while retaining predictive performance.

Introduction

Efforts to develop classifiers with strong discrimination power using voting methods have marginalized the importance of comprehensibility. Bauer and Kohavi [1998] state that “for learning tasks where comprehensibility is not crucial, voting methods are extremely useful.” However, as many authors have pointed out, problem domains such as credit approval and medical diagnosis do require interpretable as well as accurate classification methods. For instance, Swartout [1983] commented that “trust in a system is developed not only by the quality of the results but also by clear description of how they were derived. ... In addition to providing diagnoses or prescriptions, a consultant program must be able to explain what it is doing and why it is doing it.” In this note we present a boosted naïve Bayes classifier with both competitive discrimination ability and transparent reasoning. The next section provides a very brief introduction to boosting. Then we describe our proposed boosted, interpretable naïve Bayes classifier, while the last section examines its performance empirically.

Boosting

Boosting describes a general voting method for learning from a sequence of models. Observations poorly modeled by $H_t$ receive greater weight for learning $H_{t+1}$. The final boosted model is a combination of the predictions from each $H_t$, where each $H_t$ is weighted according to the quality of its classification of the training data. Freund and Schapire [1995] presented a boosting algorithm that empirically has yielded reductions in bias, variance, and misclassification rates with a variety of base classifiers and problem settings. The AdaBoost (adaptive boosting) algorithm of Freund and Schapire involves the following steps. The data for this problem take the form $(X, Y)_i$ where $Y_i \in \{0, 1\}$. Initialize the weight of each observation to $w_i^{(1)} = 1/N$. For $t = 1, \dots, T$ do the following:

1. Using the weights, learn model $H_t(x_i) : X \to [0, 1]$.
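The excerpt above cuts off after step 1 of the algorithm. The following is a minimal sketch of how the full boosting loop might pair with a naïve Bayes base learner, assuming Freund and Schapire's AdaBoost weight-update rule for hypotheses with outputs in [0, 1]; the base learner (scikit-learn's GaussianNB), the error/beta update, the stopping rule, and the final voting rule are filled in from the standard AdaBoost description rather than from the text, and the function names are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def fit_boosted_nb(X, y, T=10):
    """Fit T weighted naive Bayes models and return them with their log(1/beta) votes."""
    N = len(y)
    w = np.full(N, 1.0 / N)                            # w_i^(1) = 1/N
    models, votes = [], []
    for t in range(T):
        nb = GaussianNB().fit(X, y, sample_weight=w)   # step 1: learn H_t using the weights
        h = nb.predict_proba(X)[:, 1]                  # H_t(x_i) in [0, 1]
        loss = np.abs(h - y)                           # per-observation loss
        eps = np.clip(np.sum(w * loss), 1e-10, None)   # weighted training error
        if eps >= 0.5:                                 # no better than chance: stop boosting
            break
        beta = eps / (1.0 - eps)
        w = w * beta ** (1.0 - loss)                   # poorly modeled observations keep high weight
        w = w / w.sum()
        models.append(nb)
        votes.append(np.log(1.0 / beta))
    return models, np.array(votes)

def predict_boosted_nb(models, votes, X):
    """Classify 1 when the weighted vote of the H_t's exceeds half the total vote weight."""
    H = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return (H @ votes >= 0.5 * votes.sum()).astype(int)
```

As a usage sketch, fit_boosted_nb(X_train, y_train, T=25) followed by predict_boosted_nb(models, votes, X_test) would produce boosted predictions; each H_t remains an ordinary naïve Bayes model whose per-feature evidence can be inspected.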
Similar Papers
Boosting methodology for regression problems
Classification problems have dominated research on boosting to date. The application of boosting to regression problems, on the other hand, has received little investigation. In this paper we develop a new boosting method for regression problems. We cast the regression problem as a classification problem and apply an interpretable form of the boosted naïve Bayes classifier. This induces a regre...
Comparison of Decision Tree and Naïve Bayes Methods in Classification of Researcher’s Cognitive Styles in Academic Environment
In today’s internet-driven world, it is important to give users feedback based on what they demand. Moreover, one of the important tasks in data mining is classification. Today, there are several classification techniques for solving classification problems, such as Genetic Algorithms, Decision Trees, Bayesian methods, and others. In this article, we attempt to classify researchers as “Expert” and “No...
S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization
Nowadays, new methods are required to take advantage of the rich and extensive gold mine of data, given the vast amount of data created in particular by educational systems. Data mining algorithms have been used in educational systems, especially e-learning systems, due to the broad usage of these systems. Providing a model to predict final student results in an educational course is a reason for usi...
Document Classification Approach Leads to a Simple, Accurate, Interpretable G Protein Coupled Receptor Classifier
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins at a fast rate. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to the extreme diversity among their members; yet, they are an important subject in pharmacological research, being the target of approximat...